{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(serving-ml-dl-models)=\n",
    "# Serving pre-trained ML/DL models\n",
    "\n",
    "This notebook demonstrates how to serve standard ML/DL models using **MLRun Serving**.\n",
    "\n",
    "Before you start, make sure you have gone over the basics in the MLRun [**Quick Start Tutorial**](https://docs.mlrun.org/en/latest/tutorial/01-mlrun-basics.html).\n",
    "\n",
    "\n",
    "MLRun serving can produce managed real-time serverless pipelines from various tasks, including MLRun models or standard model files.\n",
    "The pipelines use the Nuclio real-time serverless engine, which can be deployed anywhere.\n",
    "[Nuclio](https://nuclio.io/) is a high-performance open-source \"serverless\" framework that's focused on data, I/O, and compute-intensive workloads.\n",
    "\n",
    "\n",
    "MLRun serving supports advanced real-time data processing and model serving pipelines.<br>\n",
    "For more details and examples, see the [MLRun serving pipelines](https://docs.mlrun.org/en/latest/serving/serving-graph.html) documentation.\n",
    "\n",
    "Tutorial steps:\n",
    "- [**Using pre-built MLRun serving classes and images**](#pre-built-serving)\n",
    "- [**Create and test the serving function**](#create-function)\n",
    "- [**Deploy the serving function**](#deploy-serving)\n",
    "- [**Build a custom serving class**](#custom-class)\n",
    "- [**Build an advanced model serving graph**](#serving=graph)\n",
    "\n",
    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/OUjOus4dZfw\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen></iframe>\n",
    "\n",
    "## MLRun installation and configuration\n",
    "\n",
    "Before running this notebook, make sure the `mlrun` package is installed (`pip install mlrun`) and that you have configured access to the MLRun service."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install MLRun if it is not already installed. Run this only once, then restart the notebook kernel.\n",
    "%pip install mlrun"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Get or create a new project**\n",
    "\n",
    "You should create, load or use (get) an [MLRun Project](https://docs.mlrun.org/en/latest/projects/project.html). The `get_or_create_project()` method tries to load the project from the MLRun DB. If the project does not exist, it creates a new one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import mlrun\n",
    "\n",
    "project = mlrun.get_or_create_project(\"tutorial\", context=\"./\", user_project=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"pre-built-serving\"></a>\n",
    "## Using pre-built MLRun serving classes and images\n",
    "\n",
    "MLRun contains built-in serving functionality for the major ML/DL frameworks (Scikit-Learn, TensorFlow.Keras, ONNX, XGBoost, LightGBM, and PyTorch). In addition, MLRun provides a few container images with the required ML/DL packages pre-installed.\n",
    "\n",
    "You can overwrite the packages in the images, or provide your own image. (You just need to make sure that the `mlrun` package is installed in it.)\n",
    "\n",
    "The following table specifies, for each framework, the relevant pre-integrated image and the corresponding MLRun `ModelServer` serving class:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "|framework       |image          |serving class                               |\n",
    "|:---------------|:--------------|:-------------------------------------------|\n",
    "|Scikit-Learn    |mlrun/mlrun    |mlrun.frameworks.sklearn.SklearnModelServer |\n",
    "|TensorFlow.Keras|mlrun/ml-models|mlrun.frameworks.tf_keras.TFKerasModelServer|\n",
    "|ONNX            |mlrun/ml-models|mlrun.frameworks.onnx.ONNXModelServer       |\n",
    "|XGBoost         |mlrun/ml-models|mlrun.frameworks.xgboost.XGBoostModelServer |\n",
    "|LightGBM        |mlrun/ml-models|mlrun.frameworks.lgbm.LGBMModelServer       |\n",
    "|PyTorch         |mlrun/ml-models|mlrun.frameworks.pytorch.PyTorchModelServer |\n",
    "\n",
    "> For GPU support, use the `mlrun/ml-models-gpu` image, which adds GPU drivers and support."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Example using scikit-learn and TF Keras models**\n",
    "\n",
    "The following two examples show how to specify the parameters. They use standard pre-trained models (trained on the iris dataset) stored in the MLRun samples repository. (You can use your own models instead.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "models_dir = mlrun.get_sample_path(\"models/serving/\")\n",
    "\n",
    "# Choose the model file that matches the current environment to avoid pickle warnings\n",
    "import sys\n",
    "\n",
    "suffix = (\n",
    "    mlrun.__version__.split(\"-\")[0].replace(\".\", \"_\")\n",
    "    if sys.version_info[1] > 7\n",
    "    else \"3.7\"\n",
    ")\n",
    "\n",
    "framework = \"sklearn\"  # change to 'keras' to try the 2nd option\n",
    "kwargs = {}\n",
    "if framework == \"sklearn\":\n",
    "    serving_class = \"mlrun.frameworks.sklearn.SklearnModelServer\"\n",
    "    model_path = models_dir + f\"sklearn-{suffix}.pkl\"\n",
    "    image = \"mlrun/mlrun\"\n",
    "else:\n",
    "    serving_class = \"mlrun.frameworks.tf_keras.TFKerasModelServer\"\n",
    "    model_path = models_dir + \"keras.h5\"\n",
    "    image = \"mlrun/ml-models\"  # or mlrun/ml-models-gpu when using GPUs\n",
    "    kwargs[\"labels\"] = {\"model-format\": \"h5\"}"
   ]
  },
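  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `suffix` expression above is compact, so a standalone sketch may help show what it yields. The helper name `model_suffix` and the version strings below are hypothetical, chosen only for illustration. The logic mirrors the cell above, under the assumption that the Python minor version is passed in explicitly.\n",
    "\n",
    "```python\n",
    "def model_suffix(mlrun_version: str, python_minor: int) -> str:\n",
    "    # Hypothetical helper that mirrors the suffix expression used above\n",
    "    if python_minor > 7:\n",
    "        # Drop any pre-release tag after \"-\" and replace dots with underscores\n",
    "        return mlrun_version.split(\"-\")[0].replace(\".\", \"_\")\n",
    "    # Python 3.7 environments fall back to a fixed suffix\n",
    "    return \"3.7\"\n",
    "\n",
    "\n",
    "# Illustrative version strings only, not tied to a specific release\n",
    "print(model_suffix(\"1.3.0-rc1\", 9))  # 1_3_0\n",
    "print(model_suffix(\"1.3.0\", 7))  # 3.7\n",
    "```\n",
    "\n",
    "With the first result, the f-string in the cell above would produce the file name `sklearn-1_3_0.pkl`."
   ]
  },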
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Log the model\n",
    "\n",
    "The model and its metadata are first registered in MLRun's **Model Registry**. Use the `log_model()` method to specify the model files and metadata (metrics, schema, parameters, etc.)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_object = project.log_model(f\"{framework}-model\", model_file=model_path, **kwargs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"create-function\"></a>\n",
    "## Create and test the serving function\n",
    "\n",
    "Create a new **`serving`** function and specify its `name` and the `image` that contains your desired framework.\n",
    "\n",
    "> If you want to add specific packages to the base image, specify the `requirements` attribute, for example:\n",
    "> \n",
    "> ```python\n",
    "> serving_fn = mlrun.new_function(\"serving\", image=image, kind=\"serving\", requirements=[\"tensorflow==2.8.1\"])\n",
    "> ```\n",
    "\n",
    "The following example uses a basic topology of a model `router` and adds a single model behind it. (You can add multiple models to the same function.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.43.0 (0)\n",
       " -->\n",
       "<!-- Title: mlrun&#45;flow Pages: 1 -->\n",
       "<svg width=\"302pt\" height=\"52pt\"\n",
       " viewBox=\"0.00 0.00 302.29 52.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 48)\">\n",
       "<title>mlrun&#45;flow</title>\n",
       "<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-48 298.29,-48 298.29,4 -4,4\"/>\n",
       "<!-- _start -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>_start</title>\n",
       "<polygon fill=\"lightgrey\" stroke=\"black\" points=\"38.55,-4.05 40.7,-4.15 42.83,-4.3 44.92,-4.49 46.98,-4.74 48.99,-5.03 50.95,-5.36 52.84,-5.75 54.66,-6.18 56.4,-6.65 58.06,-7.16 59.63,-7.71 61.11,-8.31 62.49,-8.94 63.76,-9.61 64.93,-10.31 65.99,-11.04 66.93,-11.8 67.77,-12.59 68.48,-13.41 69.09,-14.25 69.58,-15.11 69.95,-15.99 70.21,-16.89 70.36,-17.8 70.4,-18.72 70.33,-19.65 70.16,-20.59 69.89,-21.53 69.53,-22.47 69.07,-23.41 68.52,-24.35 67.89,-25.28 67.18,-26.2 66.4,-27.11 65.55,-28.01 64.63,-28.89 63.65,-29.75 62.62,-30.59 61.53,-31.41 60.4,-32.2 59.23,-32.96 58.02,-33.69 56.78,-34.39 55.5,-35.06 54.2,-35.69 52.88,-36.29 51.53,-36.84 50.17,-37.35 48.79,-37.82 47.4,-38.25 46,-38.64 44.59,-38.97 43.17,-39.26 41.75,-39.51 40.32,-39.7 38.89,-39.85 37.45,-39.95 36.02,-40 34.58,-40 33.15,-39.95 31.71,-39.85 30.28,-39.7 28.85,-39.51 27.43,-39.26 26.01,-38.97 24.6,-38.64 23.2,-38.25 21.81,-37.82 20.43,-37.35 19.07,-36.84 17.72,-36.29 16.4,-35.69 15.1,-35.06 13.82,-34.39 12.58,-33.69 11.37,-32.96 10.2,-32.2 9.07,-31.41 7.98,-30.59 6.95,-29.75 5.97,-28.89 5.05,-28.01 4.2,-27.11 3.42,-26.2 2.71,-25.28 2.08,-24.35 1.53,-23.41 1.07,-22.47 0.71,-21.53 0.44,-20.59 0.27,-19.65 0.2,-18.72 0.24,-17.8 0.39,-16.89 0.65,-15.99 1.02,-15.11 1.51,-14.25 2.11,-13.41 2.83,-12.59 3.66,-11.8 4.61,-11.04 5.67,-10.31 6.84,-9.61 8.11,-8.94 9.49,-8.31 10.97,-7.71 12.54,-7.16 14.2,-6.65 15.94,-6.18 17.76,-5.75 19.65,-5.36 21.61,-5.03 23.62,-4.74 25.68,-4.49 27.77,-4.3 29.9,-4.15 32.05,-4.05 34.22,-4 36.38,-4 38.55,-4.05\"/>\n",
       "<text text-anchor=\"middle\" x=\"35.3\" y=\"-18.3\" font-family=\"Times,serif\" font-size=\"14.00\">start</text>\n",
       "</g>\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title></title>\n",
       "<polygon fill=\"none\" stroke=\"black\" points=\"164.6,-14.54 164.6,-29.46 148.78,-40 126.42,-40 110.6,-29.46 110.6,-14.54 126.42,-4 148.78,-4 164.6,-14.54\"/>\n",
       "<polygon fill=\"none\" stroke=\"black\" points=\"168.6,-12.4 168.6,-31.6 149.99,-44 125.2,-44 106.6,-31.6 106.6,-12.4 125.2,0 149.99,0 168.6,-12.4\"/>\n",
       "</g>\n",
       "<!-- _start&#45;&gt; -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>_start&#45;&gt;</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M69.98,-22C78.48,-22 87.7,-22 96.49,-22\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"96.55,-25.5 106.55,-22 96.55,-18.5 96.55,-25.5\"/>\n",
       "</g>\n",
       "<!-- sklearn -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>sklearn</title>\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"249.45\" cy=\"-22\" rx=\"44.69\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"249.45\" y=\"-18.3\" font-family=\"Times,serif\" font-size=\"14.00\">sklearn</text>\n",
       "</g>\n",
       "<!-- &#45;&gt;sklearn -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>&#45;&gt;sklearn</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M168.61,-22C176.56,-22 185.39,-22 194.19,-22\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"194.46,-25.5 204.46,-22 194.46,-18.5 194.46,-25.5\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.graphs.Digraph at 0x7f839aafc4c0>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "serving_fn = mlrun.new_function(\"serving\", image=image, kind=\"serving\", requirements=[])\n",
    "serving_fn.add_model(\n",
    "    framework, model_path=model_object.uri, class_name=serving_class, to_list=True\n",
    ")\n",
    "\n",
    "# Plot the serving topology input -> router -> model\n",
    "serving_fn.plot(rankdir=\"LR\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Simulate the model server locally (using a mock server)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a mock server that represents the serving pipeline\n",
    "server = serving_fn.to_mock_server()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>\n",
    "\n",
    "**Test the mock model server endpoint**\n",
    "    \n",
    "- List the served models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'models': ['sklearn']}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "server.test(\"/v2/models/\", method=\"GET\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Infer using test data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "X does not have valid feature names, but RandomForestClassifier was fitted with feature names\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'id': '826608653e95452b9ac48fcca1ab8c47',\n",
       " 'model_name': 'sklearn',\n",
       " 'outputs': [0, 2]}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sample = {\"inputs\": [[5.1, 3.5, 1.4, 0.2], [7.7, 3.8, 6.7, 2.2]]}\n",
    "server.test(path=f\"/v2/models/{framework}/infer\", body=sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See more API options and parameters in [Model serving API](https://docs.mlrun.org/en/latest/serving/model-api.html)."
   ]
  },
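  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The inference requests used above share a simple shape, which the following sketch makes explicit. It is plain Python with no MLRun calls. `build_infer_request` is a hypothetical helper name, and the path and body layout are taken from the cells above rather than from a full reading of the API specification.\n",
    "\n",
    "```python\n",
    "from typing import Any, Dict, List, Tuple\n",
    "\n",
    "\n",
    "def build_infer_request(\n",
    "    model_name: str, rows: List[List[float]]\n",
    ") -> Tuple[str, Dict[str, Any]]:\n",
    "    # Hypothetical helper: returns the path and body used for an infer call\n",
    "    path = f\"/v2/models/{model_name}/infer\"\n",
    "    # Each inner list is one sample (one row of feature values)\n",
    "    body = {\"inputs\": rows}\n",
    "    return path, body\n",
    "\n",
    "\n",
    "path, body = build_infer_request(\"sklearn\", [[5.1, 3.5, 1.4, 0.2]])\n",
    "print(path)  # /v2/models/sklearn/infer\n",
    "```\n",
    "\n",
    "The resulting pair can be passed to the mock server in the same way as above: `server.test(path=path, body=body)`."
   ]
  },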
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"deploy-serving\"></a>\n",
    "## Deploy the serving function\n",
    "\n",
    "Deploy the serving function and use `invoke` to test it with the provided `sample`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> 2023-03-13 08:33:34,552 [info] Starting remote function deploy\n",
      "2023-03-13 08:33:34  (info) Deploying function\n",
      "2023-03-13 08:33:34  (info) Building\n",
      "2023-03-13 08:33:34  (info) Staging files and preparing base images\n",
      "2023-03-13 08:33:34  (info) Building processor image\n",
      "2023-03-13 08:34:29  (info) Build complete\n",
      "2023-03-13 08:34:39  (info) Function deploy complete\n",
      "> 2023-03-13 08:34:45,922 [info] successfully deployed function: {'internal_invocation_urls': ['nuclio-tutorial-yonis-serving.default-tenant.svc.cluster.local:8080'], 'external_invocation_urls': ['tutorial-yonis-serving-tutorial-yonis.default-tenant.app.vmdev30.lab.iguazeng.com/']}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "DeployStatus(state=ready, outputs={'endpoint': 'http://tutorial-yonis-serving-tutorial-yonis.default-tenant.app.vmdev30.lab.iguazeng.com/', 'name': 'tutorial-yonis-serving'})"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "project.deploy_function(serving_fn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> 2023-03-13 08:34:46,009 [info] invoking function: {'method': 'POST', 'path': 'http://nuclio-tutorial-yonis-serving.default-tenant.svc.cluster.local:8080/v2/models/sklearn/infer'}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'id': 'b699f7e6-2d3b-4fa4-9534-fa6b9fa3f423',\n",
       " 'model_name': 'sklearn',\n",
       " 'outputs': [0, 2]}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "serving_fn.invoke(path=f\"/v2/models/{framework}/infer\", body=sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"custom-class\"></a>\n",
    "## Build a custom serving class\n",
    "\n",
    "Model serving classes implement the full model serving functionality, which includes loading models, pre- and post-processing, prediction, explainability, and model monitoring.\n",
    "\n",
    "Model serving classes must inherit from `mlrun.serving.V2ModelServer` and, at a minimum, implement the `load()` method (download the model file(s) and load the model into memory) and the `predict()` method (accept the request payload and return the prediction/inference results).\n",
    "\n",
    "For more detailed information on custom serving classes, see [Build your own model serving class](https://docs.mlrun.org/en/latest/serving/custom-model-serving-class.html).\n",
    "\n",
    "The following code demonstrates a minimal scikit-learn (a.k.a. sklearn) serving-class implementation:\n",
    "\n",
    "\n",
    "```python\n",
    "from cloudpickle import load\n",
    "import numpy as np\n",
    "from typing import List\n",
    "import mlrun\n",
    "\n",
    "class ClassifierModel(mlrun.serving.V2ModelServer):\n",
    "    def load(self):\n",
    "        \"\"\"Load and initialize the model and/or other elements.\"\"\"\n",
    "        model_file, extra_data = self.get_model('.pkl')\n",
    "        # Use a context manager so the file handle is closed after loading\n",
    "        with open(model_file, 'rb') as f:\n",
    "            self.model = load(f)\n",
    "\n",
    "    def predict(self, body: dict) -> List:\n",
    "        \"\"\"Generate model predictions from sample.\"\"\"\n",
    "        feats = np.asarray(body['inputs'])\n",
    "        result: np.ndarray = self.model.predict(feats)\n",
    "        return result.tolist()\n",
    "```\n",
    "\n",
    "To create a function that incorporates the code of the new class (in `serving.py`), use `code_to_function`:\n",
    "\n",
    "```python\n",
    "serving_fn = mlrun.code_to_function('serving', filename='serving.py', kind='serving', image='mlrun/mlrun')\n",
    "serving_fn.add_model('my_model', model_path=model_file, class_name='ClassifierModel')\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"serving=graph\"></a>\n",
    "## Build an advanced model serving graph\n",
    "\n",
    "MLRun graphs enable building and running DAGs (directed acyclic graphs). Graphs are composed of individual steps.\n",
    "The first graph element accepts an `Event` object, transforms/processes the event, and passes the result to the next step\n",
    "in the graph, and so on. The final result can be written out to a destination (file, DB, stream, etc.) or returned to the caller\n",
    "(one of the graph steps can be marked with `.respond()`).\n",
    "\n",
    "The serving graphs can be composed of [pre-defined graph steps](../serving/available-steps.html), block-type elements (model servers, routers, ensembles,\n",
    "data readers and writers, data engineering tasks, validators, etc.), [custom steps](../serving/writing-custom-steps.html), or native Python\n",
    "classes/functions. A graph can have data processing steps, model ensembles, model servers, post-processing, etc.\n",
    "Graphs can auto-scale and span multiple function containers (connected through streaming protocols).\n",
    "\n",
    "See the [Advanced Model Serving Graph Notebook Example](../serving/graph-example.html)."
   ]
  },
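  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conceptually, a linear graph behaves like a chain of functions in which each step receives the output of the previous one. The plain-Python sketch below illustrates only that idea and does not use the MLRun graph API. `run_flow`, `double`, and `total` are hypothetical names, and real graphs add routing, branching, and asynchronous execution that this sketch leaves out.\n",
    "\n",
    "```python\n",
    "from typing import Any, Callable, List\n",
    "\n",
    "\n",
    "def run_flow(steps: List[Callable[[Any], Any]], body: Any) -> Any:\n",
    "    # Pass the event body through each step in order\n",
    "    for step in steps:\n",
    "        body = step(body)\n",
    "    # The output of the last step plays the role of the response\n",
    "    return body\n",
    "\n",
    "\n",
    "def double(values: List[float]) -> List[float]:\n",
    "    # Example pre-processing step: scale every value\n",
    "    return [v * 2 for v in values]\n",
    "\n",
    "\n",
    "def total(values: List[float]) -> dict:\n",
    "    # Example final step: aggregate the values into a result\n",
    "    return {\"sum\": sum(values)}\n",
    "\n",
    "\n",
    "print(run_flow([double, total], [1, 2, 3]))  # {'sum': 12}\n",
    "```"
   ]
  },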
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Done!\n",
    "\n",
    "Congratulations! You've completed Part 3 of the MLRun getting-started tutorial.\n",
    "Proceed to [**Part 4: ML Pipeline**](04-pipeline.html) to learn how to create an automated pipeline for your project."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
